README File for "Increasing the Demand for Workers with a Criminal Record"

OVERVIEW
========================================
This replication package contains data files and code used to generate the analysis data, figures, tables, and statistics from "Increasing the Demand for Workers with a Criminal Record." The code is divided into six files that run in Stata:
- 0_analysis_data.do
- 1_mainfigures.do
- 2_maintables.do
- 3_appendixfigures.do
- 4_appendixtables.do
- 5_additionalresults.do 
The replicator should expect the code to run for about 8 hours.

DATA AVAILABILITY AND PROVENANCE
========================================
- Survey and administrative data on firm and manager characteristics for subjects on the Platform came from the Platform. This data cannot be made publicly available.
- Data on unemployment are from the U.S. Bureau of Labor Statistics and the U.S. Census Current Population Survey. All of these data are publicly available.
- Data about COVID-19 cases at the county level were collected by The New York Times and can be downloaded from: https://github.com/nytimes/covid-19-data
- Additional data on firm characteristics come from Infogroup (now known as Data Axle) and can be purchased from: https://www.data-axle.com/our-data/business-data/

DATA FILE DESCRIPTIONS
========================================
The following section describes the contents of data files referenced in the .do files but not included in the replication package.
Files:
- infogroup_universe.dta: Contains data from Infogroup on firms in the United States. Used to create balance table comparing firm characteristics.
- main_survey_wide.dta: Contains survey data where each observation represents a manager's survey response. 
	- Variables whose names signify a month-year or range of months (e.g. since_feb, all2020, may2019) indicate the number of jobs posted by the respondent's firm in that date range.
	- multiday_count is the total number of jobs posted by the respondent's firm that are multiday 
	- approved_jobs is the total number of jobs that the firm has approved on the platform from late 2018 to early 2020 
	- exp is the respondent's stated number of years of experience hiring
	- has_job is an indicator for the firm having posted any job on the Platform in 2018-2020 according to our admin data from the Platform
	- local_ind_ue_rate is the local industry-specific unemployment rate
	- local_low_skill_ue_rate is the local unemployment rate for people with less than a high school degree
	- met_ue_rate is the metropolitan area's unemployment rate
	- county_ue_rate is the county unemployment rate
	- cust_int indicates whether the respondent says that the jobs posted by their firm involve customer interaction
	- high_val indicates whether the respondent says that the jobs posted by their firm involve handling high value inventory
	- estab_age is the age of the firm 
	- estab_year is the year the firm was established
	- empsize is the number of employees that the firm employs
	- shrm_* indicates whether the respondent says their firm has the corresponding SHRM hiring policy for workers with a criminal record
	- ln_prior is ln(performance_percent_pre)
	- ln_posterior is ln(performance_percent_post)
	- ln_cases_pc is ln(COVID cases per population)
	- median_cases_pc is an indicator for being above the median COVID cases per population among surveyed respondents
	- quartile_cases_pc is the respondent's quartile in terms of COVID cases per population
	- state_of_emergency is an indicator for the respondent completing the survey after the COVID state of emergency was declared
	- info_type is an indicator for which of two information treatments the respondent was given (1 is for giving information on high performance of workers with criminal records and 2 is for giving information on low performance).
	- norm_perception_update is the difference between ln(Posterior) and ln(Prior) Beliefs for info_type == 1 and -(difference between ln(Posterior) and ln(Prior) Beliefs) for info_type == 2
	- norm_perception_gap is the difference between ln(true high performance rate) and ln(Prior) Beliefs for info_type == 1 and -(difference between ln(true low performance rate) and ln(Prior) Beliefs) for info_type == 2
	- ins_cap specifies the level of insurance proposed to the respondent when asked if they would hire workers with a criminal record (WCs) if the platform offered crime and safety insurance up to the cap
	- years_elapsed specifies the year restriction proposed to the respondent when asked if they would hire WCs if the platform restricts to WCs where the specified number of years has elapsed since the WCs' last convictions
	- past_jobs specifies the number of past completed jobs on the Platform proposed to the respondent when asked if they would hire WCs if they have completed at least the specified number of jobs on the Platform
- main_survey_long.dta: Contains survey data in long form. An individual appears once for their baseline response and then additional times for their responses under main treatment conditions. Variable definitions are similar to those in main_survey_wide.dta.
	- hire is an indicator for would hire
	- baseline is an indicator for the baseline response at the specified subsidy level with no additional treatment
- main_survey_longcrime.dta: Contains survey data in long form, but where each individual appears once for their baseline response with subsidy (question == "sub") and then additional times for their responses to whether they would hire workers with criminal records for different types of criminal convictions.
	- question specifies whether the response is for the baseline response under subsidy (sub) or for a worker who was convicted for a particular type of crime. 
		- *_fel indicates felony
		- *_mis indicates misdemeanor
- platform_infogroup.dta: Contains the merge of the Infogroup universe dataset with the platform universe dataset. Used to identify characteristics of firms on the platform that are in the Infogroup data (e.g. firm age). Keep in mind that the merge is not perfect, since it was based on firm name and state.
	- surveyed indicates whether a survey respondent is part of the firm
- platform_universe.dta: Contains data on all firms on the platform. 
- reweight_by_infogroup_industry_firm.dta: Long form dataset where each row contains a weight. 
	- "reweight_shrm" indicates that the "weights" variable contains the weight calculated based on marginal distributions of SHRM firm policies
	- "reweight_infogroup" indicates that the "weights" variable contains the weight calculated based on marginal distributions of firm age and number of employees
	- subsample indicates that the manager with the "mgr_id" is in the subsample with nonmissing characteristics that was used to calculate "reweight_shrm" and "reweight_infogroup".
- SHRM.xlsx: Contains statistics from SHRM on WC hiring policies among firms they surveyed. Used as part of descriptive statistics table.
- us_naicssector_large_emplsize_2019.xlsx: Contains statistics on detailed employment sizes in 2019 by industry sector. From US Census 2019 County Business Patterns.
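
The definitions of norm_perception_update and norm_perception_gap given above for main_survey_wide.dta can be sketched in Stata as follows. This is only an illustration of the sign convention, not the code in 0_analysis_data.do. The variable names follow the descriptions above; the three toy rows and the scalars true_high and true_low are made-up placeholders, since the survey data and the true performance rates are not reproduced in this README.

```stata
* Toy data only: three made-up respondents, not values from the survey.
clear
input info_type ln_prior ln_posterior
1 -0.50 -0.30
2 -1.20 -1.00
1 -0.40 -0.40
end

* Hypothetical placeholders for the true high and low performance rates.
scalar true_high = 0.80
scalar true_low  = 0.20

* Update: ln(posterior) - ln(prior), with the sign flipped for info_type == 2.
generate double norm_perception_update = ln_posterior - ln_prior
replace norm_perception_update = -norm_perception_update if info_type == 2

* Gap: ln(true rate) - ln(prior), with the sign flipped for info_type == 2.
generate double norm_perception_gap = ln(scalar(true_high)) - ln_prior if info_type == 1
replace norm_perception_gap = -(ln(scalar(true_low)) - ln_prior) if info_type == 2

list
```

Flipping the sign for info_type == 2 puts both information treatments on a common scale, so that a positive value has the same interpretation in either arm.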

COMPUTATION REQUIREMENTS
========================================
- Stata (code was last run with version 16)
	- Necessary dependencies are installed by the header of each .do file.

Controlled Randomness: Random seeds are set at lines 33 and 34 of program 4_appendixtables.do.
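
As a minimal illustration of why the seeds matter, the Stata sketch below shows that resetting the seed before a random draw reproduces the same numbers. The seed 12345 is a hypothetical value chosen for this example; it is not one of the seeds used in 4_appendixtables.do.

```stata
* Hypothetical seed; the seeds in 4_appendixtables.do are not reproduced here.
clear
set obs 5

set seed 12345
generate double u1 = runiform()

* Resetting the same seed restarts the random-number stream.
set seed 12345
generate double u2 = runiform()

* The two sets of draws are identical, so this assertion passes.
assert u1 == u2
```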

Memory and Runtime Requirements: The code was last run on an M1 laptop with 16 GB of RAM running macOS version 12.0.1. Computation took 8 hours.

Program Descriptions:
=====================================================================================
Program Name: 0_analysis_data.do
Created: July 2021
Edited: July 1, 2022 
Purpose: Cleans and creates analysis data.
What to Change: 
- Open the do file by clicking on it rather than opening it through Stata
- If opened through Stata, set working directory to the Replication folder that the do file is in
=====================================================================================
Program Name: 1_mainfigures.do
Created: July 2021
Edited: July 1, 2022 
Purpose: Creates all the main figures for the paper. 
What to Change: 
- Open the do file by clicking on it rather than opening it through Stata
- If opened through Stata, set working directory to the Replication folder that the do file is in
=====================================================================================
Program Name: 2_maintables.do
Created: July 2021
Edited: July 1, 2022 
Purpose: Creates all the main tables for the paper, including summary statistics and regression results tables. 
What to Change: 
- Open the do file by clicking on it rather than opening it through Stata
- If opened through Stata, set working directory to the Replication folder that the do file is in
=====================================================================================
Program Name: 3_appendixfigures.do
Created: June 2022
Edited: July 1, 2022 
Purpose: Creates all the appendix figures for the paper. 
What to Change: 
- Open the do file by clicking on it rather than opening it through Stata
- If opened through Stata, set working directory to the Replication folder that the do file is in
=====================================================================================
Program Name: 4_appendixtables.do
Created: June 2022
Edited: July 1, 2022 
Purpose: Creates all the appendix tables for the paper.
What to Change: 
- Open the do file by clicking on it rather than opening it through Stata
- If opened through Stata, set working directory to the Replication folder that the do file is in
=====================================================================================
Program Name: 5_additionalresults.do
Created: August 2021
Edited: July 1, 2022 
Purpose: Calculates statistics reported in the paper that are not in the tables or figures. 
What to Change: 
- Open the do file by clicking on it rather than opening it through Stata
- If opened through Stata, set working directory to the Replication folder that the do file is in
=====================================================================================
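
For replicators who prefer to run everything from a single Stata session, the working-directory instruction above can be sketched as follows. This is an assumption about usage, not part of the package: the path is a hypothetical placeholder, and the loop assumes that each do file resolves its inputs and outputs relative to the Replication folder. It also requires the data files that are not included in the package.

```stata
* Hypothetical path; replace with the location of the Replication folder.
cd "/path/to/Replication"

* Run the six programs in numerical order.
foreach f in 0_analysis_data 1_mainfigures 2_maintables 3_appendixfigures 4_appendixtables 5_additionalresults {
    do "`f'.do"
}
```

Running 0_analysis_data.do first matters because it creates the analysis data used by the later programs.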


List of Figures, Tables, and Programs:
====================================================================================================
Figure/Table #			Program			Line Number		Output File
====================================================================================================
Figure 1 			1_mainfigures.do 	151			output_figures/f1_baseline_wagesubsidies.pdf
Figure 2 			1_mainfigures.do 	218			output_figures/f2a_labormarket_jobchar_nosub.pdf
											output_figures/f2b_labormarket_jobchar_full.pdf
Figure 3 			1_mainfigures.do 	476			output_figures/f3a_insurance_jobhistory_screen_nosub.pdf
											output_figures/f3b_insurance_jobhistory_screen_full.pdf
Figure 4 			1_mainfigures.do 	721			output_figures/f4a_crime_type_nosub.pdf
											output_figures/f4b_crime_type_full.pdf
Figure 5 			1_mainfigures.do 	866			output_figures/f5a_posteriorbeliefshighperf.pdf
											output_figures/f5b_posteriorbeliefslowperf.pdf

Table 2 			2_maintables.do 	70			output_tables/t2_summarystats.tex
Table 3 			2_maintables.do 	283			output_tables/t3_balance.tex
Table 4 			2_maintables.do 	395			output_tables/t4_ivtable.tex

Appendix Figure 1 		3_appendixfigures.do 	151			output_figures/xf1_demand_curve.pdf
Appendix Figure 2 		3_appendixfigures.do 	218			output_figures/xf2a_labormarket_nosub.pdf
											output_figures/xf2b_labormarket_full.pdf
Appendix Figure 3 		3_appendixfigures.do 	490			output_figures/xf3a_insurance_demand_curve.pdf
											output_figures/xf3b_jobhistory_demand_curve.pdf
											output_figures/xf3c_limitedscreening_demand_curve.pdf
Appendix Figure 4 		3_appendixfigures.do 	739			output_figures/xf4a_insurance_levels_nosub.pdf
											output_figures/xf4b_insurance_levels_full.pdf
Appendix Figure 5 		3_appendixfigures.do 	864			output_figures/xf5a_performance_history_nosub.pdf
											output_figures/xf5b_performance_history_full.pdf
Appendix Figure 6 		3_appendixfigures.do 	991			output_figures/xf6a_record_screening_nosub.pdf
											output_figures/xf6b_record_screening_full.pdf
Appendix Figure 7 		3_appendixfigures.do 	1116			output_figures/xf7a_perceptionupdatelowperf.pdf
											output_figures/xf7b_perceptionupdatehighperf.pdf
											output_figures/xf7c_ivestimateslowperf.pdf
											output_figures/xf7d_ivestimateshighperf.pdf
Appendix Figure 8 		3_appendixfigures.do 	1394			output_figures/xf8a_covid_nosub.pdf
											output_figures/xf8b_covid_full.pdf

Appendix Table 2 		4_appendixtables.do 	386			output_tables/xt2_summarystats_platform_firms.tex
Appendix Table 3 		4_appendixtables.do 	583			output_tables/xt3_demanddescriptives.tex
Appendix Table 4 		4_appendixtables.do 	734			output_tables/xt4_treatments.tex
Appendix Table 5 		4_appendixtables.do 	850			output_tables/xt5_convictions.tex
Appendix Table 6 		4_appendixtables.do 	1182			output_tables/xt6_contamination_bias.tex
Appendix Table 7 		4_appendixtables.do 	1362			output_tables/xt7_robustness.tex
Appendix Table 8 		4_appendixtables.do 	1732			output_tables/xt8_robustness_convictions.tex